Avoiding ascertainment bias in the maximum likelihood inference of phylogenies based on truncated data

Authors

  • Asif Tamuri
  • Nick Goldman
Abstract

Some phylogenetic datasets omit data matrix positions at which all taxa share the same state. For sequence data this may be because of a focus on single nucleotide polymorphisms (SNPs) or the use of a technique such as restriction site-associated DNA sequencing (RADseq) that concentrates attention onto regions of differences. With morphological data, it is common to omit states that show no variation across the data studied. It is already known that failing to correct for the ascertainment bias of omitting constant positions can lead to overestimates of evolutionary divergence, as the lack of constant sites is explained as high divergence rather than as a deliberate data selection technique. Previous approaches to using corrections to the likelihood function in order to avoid ascertainment bias have either required knowledge of the omitted positions, or have modified the likelihood function to reflect the omitted data. In this paper we indicate that the technique used to date for this latter approach is a conditional maximum likelihood (CML) method. An alternative approach, unconditional maximum likelihood (UML), is also possible. We investigate the performance of CML and UML and find them to have almost identical performance in the phylogenetic SNP dataset context. We also make some observations about the nucleotide frequencies observed in SNP datasets, indicating that these can differ systematically from the overall equilibrium base frequencies of the substitution process. This suggests that model parameters representing base frequencies should be estimated by maximum likelihood, and not by empirical (counting) methods.
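The overestimation described above, and the effect of conditioning on variability, can be illustrated with a small toy calculation. The sketch below is not the authors' implementation and does not cover UML. It assumes a three-taxon star tree with equal branch lengths, the JC69 substitution model, and expected pattern frequencies standing in for a very large dataset. All function names are hypothetical. The conditional likelihood divides each site likelihood by the probability that a site is variable, 1 - P(constant).

```python
import itertools
import math

STATES = "ACGT"

def jc69_prob(x, y, t):
    """JC69 probability of ending in state y after starting in x, branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e

def pattern_prob(pattern, t):
    """Site-pattern probability on a star tree (assumption: every branch has length t)."""
    return sum(
        0.25 * math.prod(jc69_prob(root, leaf, t) for leaf in pattern)
        for root in STATES
    )

def prob_constant(t, n_taxa=3):
    """Total probability that all taxa share the same state at a site."""
    return sum(pattern_prob(s * n_taxa, t) for s in STATES)

def log_lik(weights, t, conditional):
    """Log-likelihood of variable-only data; optionally conditioned on variability."""
    ll = sum(w * math.log(pattern_prob(p, t)) for p, w in weights.items())
    if conditional:
        # Conditional correction: each site likelihood is divided by 1 - P(constant).
        ll -= sum(weights.values()) * math.log(1.0 - prob_constant(t))
    return ll

def grid_mle(weights, conditional, grid):
    """Crude maximum likelihood estimate of t by grid search."""
    return max(grid, key=lambda t: log_lik(weights, t, conditional))

TRUE_T = 0.1  # assumed true branch length, in expected substitutions per site

# Keep only variable patterns, weighted by their expected frequency at TRUE_T.
variable = ["".join(p) for p in itertools.product(STATES, repeat=3) if len(set(p)) > 1]
weights = {p: pattern_prob(p, TRUE_T) for p in variable}

grid = [i / 1000 for i in range(1, 3001)]
t_naive = grid_mle(weights, conditional=False, grid=grid)
t_cml = grid_mle(weights, conditional=True, grid=grid)
print(f"true t = {TRUE_T}, uncorrected estimate = {t_naive}, conditional estimate = {t_cml}")
```

Under these assumptions the uncorrected estimate lands well above the true branch length, because the missing constant sites are read as evidence of divergence, while the conditional estimate recovers it. Real analyses involve general tree topologies, richer substitution models and finite samples, so this only shows the direction of the effect.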





Publication date: 2017